{
 "cells": [
  {
   "cell_type": "raw",
   "id": "5a5cdcae",
   "metadata": {},
   "source": [
    "1.一定看一下数据是否存在缺失值\n",
    "2.如果所有缺失值看下题目有没有要求处理，如果题目有要求进行处理。那么就按照题目的方式处理，\n",
    "        有些时候题目并没有完全把缺失值处理干净。这个时候要把缺失值处理干净。方式不限。然后用注释说明为什么要进行处理。\n",
    "       有些时候题目没有要求处理，但是又有缺失值，这个时候也一定处理掉。方式不限。然后用注释说明为什么要进行处理。"
   ]
  },
  {
   "cell_type": "raw",
   "id": "ab7d1d40",
   "metadata": {},
   "source": [
    "标准的缺失值：NaN\n",
    "通常使用numpy.nan表示缺失值"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "ee5f2186",
   "metadata": {},
   "source": [
    "1.删除"
   ]
  },
  {
   "cell_type": "raw",
   "id": "70772264",
   "metadata": {},
   "source": [
    "删除:\n",
    "  按行删除：如如果一列的缺失值没有超过20%则可以考虑将缺失值所在的行删除掉\n",
    "  按列删除:80%法则，也就是一列中的非缺失部分要不低于80%如果低于80%则可以考虑删除该列。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "b96ef444",
   "metadata": {},
   "outputs": [],
   "source": [
    "import pandas as pd\n",
    "import numpy as np"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "3046a629",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>PassengerId</th>\n",
       "      <th>Survived</th>\n",
       "      <th>Pclass</th>\n",
       "      <th>Name</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Age</th>\n",
       "      <th>SibSp</th>\n",
       "      <th>Parch</th>\n",
       "      <th>Ticket</th>\n",
       "      <th>Fare</th>\n",
       "      <th>Cabin</th>\n",
       "      <th>Embarked</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>Braund, Mr. Owen Harris</td>\n",
       "      <td>male</td>\n",
       "      <td>22.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>A/5 21171</td>\n",
       "      <td>7.2500</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Cumings, Mrs. John Bradley (Florence Briggs Th...</td>\n",
       "      <td>female</td>\n",
       "      <td>38.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>PC 17599</td>\n",
       "      <td>71.2833</td>\n",
       "      <td>C85</td>\n",
       "      <td>C</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>Heikkinen, Miss. Laina</td>\n",
       "      <td>female</td>\n",
       "      <td>26.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>STON/O2. 3101282</td>\n",
       "      <td>7.9250</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Futrelle, Mrs. Jacques Heath (Lily May Peel)</td>\n",
       "      <td>female</td>\n",
       "      <td>35.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>113803</td>\n",
       "      <td>53.1000</td>\n",
       "      <td>C123</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>5</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>Allen, Mr. William Henry</td>\n",
       "      <td>male</td>\n",
       "      <td>35.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>373450</td>\n",
       "      <td>8.0500</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   PassengerId  Survived  Pclass  \\\n",
       "0            1         0       3   \n",
       "1            2         1       1   \n",
       "2            3         1       3   \n",
       "3            4         1       1   \n",
       "4            5         0       3   \n",
       "\n",
       "                                                Name     Sex   Age  SibSp  \\\n",
       "0                            Braund, Mr. Owen Harris    male  22.0      1   \n",
       "1  Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0      1   \n",
       "2                             Heikkinen, Miss. Laina  female  26.0      0   \n",
       "3       Futrelle, Mrs. Jacques Heath (Lily May Peel)  female  35.0      1   \n",
       "4                           Allen, Mr. William Henry    male  35.0      0   \n",
       "\n",
       "   Parch            Ticket     Fare Cabin Embarked  \n",
       "0      0         A/5 21171   7.2500   NaN        S  \n",
       "1      0          PC 17599  71.2833   C85        C  \n",
       "2      0  STON/O2. 3101282   7.9250   NaN        S  \n",
       "3      0            113803  53.1000  C123        S  \n",
       "4      0            373450   8.0500   NaN        S  "
      ]
     },
     "execution_count": 2,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = pd.read_csv('./titanic_trains.csv')\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "505d11f3",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "PassengerId     0.000000\n",
       "Survived        0.000000\n",
       "Pclass          0.000000\n",
       "Name            0.000000\n",
       "Sex             0.000000\n",
       "Age            19.865320\n",
       "SibSp           0.000000\n",
       "Parch           0.000000\n",
       "Ticket          0.000000\n",
       "Fare            0.000000\n",
       "Cabin          77.104377\n",
       "Embarked        0.224467\n",
       "dtype: float64"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.isnull().sum()/df.shape[0]*100"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "86bdcfd5",
   "metadata": {},
   "outputs": [],
   "source": [
    "# 按列删除\n",
    "df1 =df.dropna(axis=1,thresh=df.shape[0]*0.8)   \n",
    "# axis表示按行还是按列删除 1表示按列删除   0表示按行删除\n",
    "# thresh：阈值；也就是非缺失部分的占比\n",
    "# how:什么情况下删除，any表示存在任意一个缺失值就删除，all表示的全部都是缺失值时才删除"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "73f158e9",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "PassengerId      0\n",
       "Survived         0\n",
       "Pclass           0\n",
       "Name             0\n",
       "Sex              0\n",
       "Age            177\n",
       "SibSp            0\n",
       "Parch            0\n",
       "Ticket           0\n",
       "Fare             0\n",
       "Embarked         2\n",
       "dtype: int64"
      ]
     },
     "execution_count": 5,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df1.isnull().sum()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "4a88138b",
   "metadata": {},
   "outputs": [],
   "source": [
    "# 按行删除\n",
    "df2 = df1.dropna(axis=0,how='any')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "58793c30",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "PassengerId    0\n",
       "Survived       0\n",
       "Pclass         0\n",
       "Name           0\n",
       "Sex            0\n",
       "Age            0\n",
       "SibSp          0\n",
       "Parch          0\n",
       "Ticket         0\n",
       "Fare           0\n",
       "Embarked       0\n",
       "dtype: int64"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df2.isnull().sum()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "70e21b89",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "(712, 11)"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df2.shape"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "97e07378",
   "metadata": {},
   "source": [
    "2.填充"
   ]
  },
  {
   "cell_type": "raw",
   "id": "dfbd8ab6",
   "metadata": {},
   "source": [
    "填充缺失值：使用一些特殊值来对缺失值进行替换，如众数、均值、中位数\n",
    "数值类型的数据可以使用众数、均值、中位数填充都可以，但是字符串(object)类型的数据只能使用众数填充。一列中的众数可能存在多个,需要选择那个进行填充\n",
    "    1.fillna：使用值进行填充\n",
    "    2.replace：替换，一次只能填充一列数据\n",
    "    3.simpleimputer：填充，可以同时填充同类型的多列数据\n",
    "    4.knnimputer：填充，可以同时多列数据，只能填充数值类型的数据"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "9926f557",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>PassengerId</th>\n",
       "      <th>Survived</th>\n",
       "      <th>Pclass</th>\n",
       "      <th>Name</th>\n",
       "      <th>Sex</th>\n",
       "      <th>Age</th>\n",
       "      <th>SibSp</th>\n",
       "      <th>Parch</th>\n",
       "      <th>Ticket</th>\n",
       "      <th>Fare</th>\n",
       "      <th>Cabin</th>\n",
       "      <th>Embarked</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>Braund, Mr. Owen Harris</td>\n",
       "      <td>male</td>\n",
       "      <td>22.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>A/5 21171</td>\n",
       "      <td>7.2500</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Cumings, Mrs. John Bradley (Florence Briggs Th...</td>\n",
       "      <td>female</td>\n",
       "      <td>38.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>PC 17599</td>\n",
       "      <td>71.2833</td>\n",
       "      <td>C85</td>\n",
       "      <td>C</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>3</td>\n",
       "      <td>1</td>\n",
       "      <td>3</td>\n",
       "      <td>Heikkinen, Miss. Laina</td>\n",
       "      <td>female</td>\n",
       "      <td>26.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>STON/O2. 3101282</td>\n",
       "      <td>7.9250</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>4</td>\n",
       "      <td>1</td>\n",
       "      <td>1</td>\n",
       "      <td>Futrelle, Mrs. Jacques Heath (Lily May Peel)</td>\n",
       "      <td>female</td>\n",
       "      <td>35.0</td>\n",
       "      <td>1</td>\n",
       "      <td>0</td>\n",
       "      <td>113803</td>\n",
       "      <td>53.1000</td>\n",
       "      <td>C123</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>5</td>\n",
       "      <td>0</td>\n",
       "      <td>3</td>\n",
       "      <td>Allen, Mr. William Henry</td>\n",
       "      <td>male</td>\n",
       "      <td>35.0</td>\n",
       "      <td>0</td>\n",
       "      <td>0</td>\n",
       "      <td>373450</td>\n",
       "      <td>8.0500</td>\n",
       "      <td>NaN</td>\n",
       "      <td>S</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "   PassengerId  Survived  Pclass  \\\n",
       "0            1         0       3   \n",
       "1            2         1       1   \n",
       "2            3         1       3   \n",
       "3            4         1       1   \n",
       "4            5         0       3   \n",
       "\n",
       "                                                Name     Sex   Age  SibSp  \\\n",
       "0                            Braund, Mr. Owen Harris    male  22.0      1   \n",
       "1  Cumings, Mrs. John Bradley (Florence Briggs Th...  female  38.0      1   \n",
       "2                             Heikkinen, Miss. Laina  female  26.0      0   \n",
       "3       Futrelle, Mrs. Jacques Heath (Lily May Peel)  female  35.0      1   \n",
       "4                           Allen, Mr. William Henry    male  35.0      0   \n",
       "\n",
       "   Parch            Ticket     Fare Cabin Embarked  \n",
       "0      0         A/5 21171   7.2500   NaN        S  \n",
       "1      0          PC 17599  71.2833   C85        C  \n",
       "2      0  STON/O2. 3101282   7.9250   NaN        S  \n",
       "3      0            113803  53.1000  C123        S  \n",
       "4      0            373450   8.0500   NaN        S  "
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df = pd.read_csv('./titanic_trains.csv')\n",
    "df.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "898e2a6c",
   "metadata": {},
   "outputs": [],
   "source": [
    "# 按列删除\n",
    "df =df.dropna(axis=1,thresh=df.shape[0]*0.8)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "d9d4353a",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "PassengerId      0\n",
       "Survived         0\n",
       "Pclass           0\n",
       "Name             0\n",
       "Sex              0\n",
       "Age            177\n",
       "SibSp            0\n",
       "Parch            0\n",
       "Ticket           0\n",
       "Fare             0\n",
       "Embarked         2\n",
       "dtype: int64"
      ]
     },
     "execution_count": 11,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df.isnull().sum()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "29674024",
   "metadata": {},
   "source": [
    "2.1 fillna"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "da8c04da",
   "metadata": {},
   "outputs": [],
   "source": [
    "df3 = df1.fillna(value=0)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "87da292c",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "PassengerId    0\n",
       "Survived       0\n",
       "Pclass         0\n",
       "Name           0\n",
       "Sex            0\n",
       "Age            0\n",
       "SibSp          0\n",
       "Parch          0\n",
       "Ticket         0\n",
       "Fare           0\n",
       "Embarked       0\n",
       "dtype: int64"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df3.isnull().sum()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9f5beeae",
   "metadata": {},
   "source": [
    "2.2 replace"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "0479713a",
   "metadata": {},
   "outputs": [],
   "source": [
    "df4 = df1.copy()\n",
    "df4['Age'] = df4['Age'].replace(np.nan,df4['Age'].mean()) # 使用均值"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "1d23edad",
   "metadata": {},
   "outputs": [],
   "source": [
    "df4['Embarked'] = df4['Embarked'].replace(np.nan,df4['Embarked'].mode()[0])  # 使用众数"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "2e6c3af7",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "PassengerId    0\n",
       "Survived       0\n",
       "Pclass         0\n",
       "Name           0\n",
       "Sex            0\n",
       "Age            0\n",
       "SibSp          0\n",
       "Parch          0\n",
       "Ticket         0\n",
       "Fare           0\n",
       "Embarked       0\n",
       "dtype: int64"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df4.isnull().sum()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "57f1a1cf",
   "metadata": {},
   "source": [
    "2.3 simpleimputer"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "08684a9b",
   "metadata": {},
   "outputs": [],
   "source": [
    "df5 = df1.copy()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "b19830e8",
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.impute import SimpleImputer\n",
    "\n",
    "# 定义规则\n",
    "model_si = SimpleImputer(missing_values=np.nan,strategy='mean')\n",
    "# 将规则应用到数据上\n",
    "df_mean = model_si.fit_transform(df5[['Age']])\n",
    "# 转型，将array类型转化为dataframe\n",
    "df_mean = pd.DataFrame(data=df_mean,columns=['Age'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "id": "52e6b92c",
   "metadata": {},
   "outputs": [],
   "source": [
    "# 使用新的(没有缺失值的数据)替换到原本有缺失值的数据\n",
    "df5['Age'] = df_mean"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "id": "60276c9d",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "PassengerId    0\n",
       "Survived       0\n",
       "Pclass         0\n",
       "Name           0\n",
       "Sex            0\n",
       "Age            0\n",
       "SibSp          0\n",
       "Parch          0\n",
       "Ticket         0\n",
       "Fare           0\n",
       "Embarked       2\n",
       "dtype: int64"
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df5.isnull().sum()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "id": "5bac8547",
   "metadata": {},
   "outputs": [],
   "source": [
    "model_si = SimpleImputer(missing_values=np.nan,strategy='most_frequent')\n",
    "df_mean = model_si.fit_transform(df5[['Embarked']])\n",
    "df_mean = pd.DataFrame(data=df_mean,columns=['Embarked'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "id": "abfe0b55",
   "metadata": {},
   "outputs": [],
   "source": [
    "df5['Embarked'] = df_mean"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "id": "d44a599c",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "<class 'pandas.core.frame.DataFrame'>\n",
      "RangeIndex: 891 entries, 0 to 890\n",
      "Data columns (total 1 columns):\n",
      " #   Column    Non-Null Count  Dtype \n",
      "---  ------    --------------  ----- \n",
      " 0   Embarked  891 non-null    object\n",
      "dtypes: object(1)\n",
      "memory usage: 7.1+ KB\n"
     ]
    }
   ],
   "source": [
    "df_mean.info()"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d5abbaec",
   "metadata": {},
   "source": [
    "2.4knnimputer"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "id": "a6032ee6",
   "metadata": {},
   "outputs": [],
   "source": [
    "df6 = df1.copy()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "id": "c3dedc3c",
   "metadata": {},
   "outputs": [],
   "source": [
    "from sklearn.impute import KNNImputer\n",
    "\n",
    "#  定义规则\n",
    "mode_knn = KNNImputer()\n",
    "# 将规则应用到数据上\n",
    "df_knn = mode_knn.fit_transform(df6[[col for col in df6.columns if df[col].dtype!= 'object']])\n",
    "# 转型\n",
    "df_knn = pd.DataFrame(data=df_knn,columns=[col for col in df6.columns if df[col].dtype!= 'object'])"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "id": "ef6e4ee1",
   "metadata": {},
   "outputs": [],
   "source": [
    "# 使用新的(没有缺失值的数据)替换到原本有缺失值的数据\n",
    "df6[df_knn.columns] = df_knn"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "id": "1de33dfb",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "PassengerId    0\n",
       "Survived       0\n",
       "Pclass         0\n",
       "Name           0\n",
       "Sex            0\n",
       "Age            0\n",
       "SibSp          0\n",
       "Parch          0\n",
       "Ticket         0\n",
       "Fare           0\n",
       "Embarked       2\n",
       "dtype: int64"
      ]
     },
     "execution_count": 27,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df6.isnull().sum()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "id": "a403b318",
   "metadata": {},
   "outputs": [],
   "source": [
    "df7 = df6.dropna(axis=0,how='any')"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "id": "cba521d8",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "PassengerId    0\n",
       "Survived       0\n",
       "Pclass         0\n",
       "Name           0\n",
       "Sex            0\n",
       "Age            0\n",
       "SibSp          0\n",
       "Parch          0\n",
       "Ticket         0\n",
       "Fare           0\n",
       "Embarked       0\n",
       "dtype: int64"
      ]
     },
     "execution_count": 29,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "df7.isnull().sum()"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.9"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
